Improving Bitmap Index Compression by Data Reorganization
Abstract
The volume of data generated by scientific applications through observations or computer simulations can reach the order of petabytes. This creates a need for effective and compact indexing methods for efficient storage and retrieval of scientific data. Bitmap indexing has been successfully applied in this domain by exploiting the fact that scientific data are mostly read-only and enumerated or numerical. Bitmap indices can be compressed for efficient storage. In this paper, we study how to reorganize bitmap tables for improved compression rates. Our algorithms are used as a preprocessing step, so there is no need to revise current indexing techniques or query processing algorithms. We propose a Gray code ordering algorithm for this NP-complete problem; it is an in-place algorithm and runs in time linear in the size of the database. We also explore the effect of the order in which columns are evaluated in the Gray code ordering, to further improve query execution time. Our experimental results on real data sets show that the compression ratio can be improved by a factor of 2 to 10 and query execution times by a factor of 4 to 7.
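To make the idea of Gray code ordering concrete, here is a minimal sketch, not the paper's algorithm. It assumes a bitmap table stored as a list of equal-length bit tuples and reorders the rows with a plain comparison sort keyed on each row's position in the reflected Gray code sequence, whereas the abstract describes an in-place, linear-time method. The names `gray_rank_key`, `gray_code_order`, and `count_runs` are hypothetical helpers introduced only for illustration.

```python
from itertools import accumulate
from operator import xor


def gray_rank_key(row):
    """Sort key placing bit rows in reflected Gray code order.

    Reading the row as a Gray codeword, the running XOR of its bits is
    the binary representation of its position in the Gray sequence.
    """
    return tuple(accumulate(row, xor))


def gray_code_order(rows):
    """Return the rows sorted in Gray code order.

    Assumption: uses a comparison sort, O(n log n), for brevity; this is
    not the in-place linear-time algorithm the abstract refers to.
    """
    return sorted(rows, key=gray_rank_key)


def count_runs(rows):
    """Total number of runs (maximal blocks of equal bits) over all columns."""
    total = 0
    for col in zip(*rows):
        total += 1 + sum(a != b for a, b in zip(col, col[1:]))
    return total


# Made-up sample table: six rows, three bitmap columns.
rows = [(0, 1, 1), (1, 0, 0), (0, 0, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
ordered = gray_code_order(rows)
print(count_runs(rows), count_runs(ordered))
```

Fewer runs per column generally means shorter output from a run-length based compressor, which is the effect the reordering aims for. Consecutive rows in Gray code order tend to differ in few bit positions, so each column changes value less often.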
Similar resources
Dynamic data organization for bitmap indices
Bitmap indices have been successfully used in scientific databases and data warehouses. Run-length encoding is commonly used to generate smaller size bitmaps that do not require explicit decompression for query processing. For static data sets, compression is shown to be greatly improved by data reordering techniques that generate longer and fewer runs. However, these data reorganization method...
Data Compression for Bitmap Indexes
Compression Ratio (CR) and Logical Operation Time (LOT) are two major measures of the efficiency of bitmap indexing. Previous works [5, 9, 10, 11] compare the performance of bitmap compression schemes separately on logical operation time and compression ratio. This paper describes these works and recommends for consideration a new metric: an overall efficiency indicator. The overa...
Bitmap Indices for Data Warehouses
In this chapter we discuss various bitmap index technologies for efficient query processing in data warehousing applications. We review the existing literature and organize the technology into three categories, namely bitmap encoding, compression and binning. We introduce an efficient bitmap compression algorithm and examine the space and time complexity of the compressed bitmap index on large ...
Compressed bitmap indices for efficient query processing
Bitmap indices are useful techniques for improving access speed of high-dimensional data in data warehouses and in large scientific databases. Even though the bitmaps are easy to compress, compressing them can significantly reduce the query processing efficiency. This is because the operations on the compressed bitmaps are much slower than the same operations on the uncompressed ones. To addres...
A Survey of Bitmap Index-Compression Algorithms for Big Data
With the growing popularity of Internet applications and the widespread use of mobile Internet, Internet traffic has maintained rapid growth over the past two decades. Internet Traffic Archival Systems (ITAS) for packets or flow records have become more and more widely used in network monitoring, network troubleshooting, and user behavior and experience analysis. In this paper, we survey bitmap...
Publication date: 2006